Building Large Language Models

AI
LLM
programming

The Rohan Paul Substack is a detailed resource with in-depth explanations of LLMs and how to build them.

Run your own LLM

Lamini: a tailored (private) LLM engine

Jay Alammar: The Illustrated GPT-2 (Visualizing Transformer Language Models)

Large Language Model Course is divided into three parts:

  1. 🧩 LLM Fundamentals covers essential knowledge about mathematics, Python, and neural networks.
  2. 🧑‍🔬 The LLM Scientist focuses on learning how to build the best possible LLMs using the latest techniques.
  3. 👷 The LLM Engineer focuses on how to create LLM-based solutions and deploy them.

Early Access Manning book Build a Large Language Model (From Scratch). GitHub: @rasbt

Transformers from Scratch: Matt Diller’s step-by-step guide, with a Colab notebook

Examples

TI-84 GPT4All and YouTube

How to add custom GPTs to any website in minutes.


Libraries

Run a variety of LLMs locally using Ollama

To run Llama 3 locally, download Ollama and run llama3:

ollama run llama3
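
Ollama also exposes a local REST API (by default on port 11434). A minimal sketch of calling it from Python, assuming the llama3 model has already been pulled:

```python
# Minimal sketch: call Ollama's local /api/generate endpoint (default port 11434).
# Assumes `ollama run llama3` has already pulled the model.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",
    "prompt": "Explain what a transformer is in one sentence.",
    "stream": False,               # return a single JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```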

Ollama main site

Ollama models supported

Collecting Data

Scripts to convert Libgen to txt (see also Explaining LLMs)
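
The linked scripts aren’t reproduced here; as a rough sketch of the general idea (using the third-party pdfminer.six package, an assumption rather than what the scripts actually use), a loop like this extracts the text layer from a folder of PDFs:

```python
# Rough sketch (not the linked scripts): convert a directory of PDFs to plain text.
# Assumes the third-party pdfminer.six package is installed.
from pathlib import Path
from pdfminer.high_level import extract_text

def pdfs_to_txt(src_dir: str, dst_dir: str) -> None:
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for pdf in Path(src_dir).glob("*.pdf"):
        text = extract_text(str(pdf))                       # pull the PDF's text layer
        (dst / f"{pdf.stem}.txt").write_text(text, encoding="utf-8")

# pdfs_to_txt("books/", "corpus_txt/")
```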

Technical Details

From my question to Metaphor.systems:

https://jalammar.github.io/illustrated-transformer/

https://huggingface.co/

Google’s free BERT model, a comparatively small language model
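
A quick way to poke at BERT is the Hugging Face transformers pipeline; a minimal sketch using the standard bert-base-uncased checkpoint:

```python
# Minimal sketch: query BERT's masked-language-model head via Hugging Face transformers.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))   # top predicted tokens with scores
```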

OpenAI API

GPT in 60 lines of NumPy (via HN)
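
The article builds a working GPT out of plain NumPy. As an illustrative sketch (not the article’s code) of the core operation involved, here is causal self-attention in NumPy:

```python
# Illustrative sketch of causal self-attention in NumPy (not the article's code):
# each position attends only to itself and earlier positions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    # x: [seq_len, d_model]; w_*: [d_model, d_head] projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # [seq_len, seq_len]
    mask = np.triu(np.ones_like(scores), k=1)        # 1s above the diagonal = future tokens
    scores = np.where(mask == 1, -1e10, scores)      # block attention to the future
    return softmax(scores) @ v                       # weighted sum of values

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)        # shape: (6, 16)
```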

How to Build

The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities

Understanding LLMs: A Comprehensive Overview from Training to Inference

2023 summary from Simon Willison: a good list of resources on how to build your own LLM.

Step-by-step tutorial on how to build Llama 3 from scratch, with code and diagrams

How to build Llama from scratch

The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI

A deep dive into the viral Transformers Math 101 article and into high-performance distributed training for Transformer-based architectures.
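
The centerpiece of Transformers Math 101 is the rule of thumb that training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens. A quick back-of-the-envelope calculation (the model size, token count, and hardware throughput below are illustrative assumptions, not from the article):

```python
# Back-of-the-envelope training-compute estimate: C ~= 6 * N * D FLOPs.
# The specific numbers below are illustrative assumptions.
params = 7e9             # N: a 7B-parameter model
tokens = 1e12            # D: 1T training tokens
flops = 6 * params * tokens                     # ~4.2e22 FLOPs total

gpu_flops = 3e14         # assume ~300 TFLOP/s sustained per GPU (bf16, after utilization)
gpus = 512
seconds = flops / (gpu_flops * gpus)
print(f"{flops:.2e} FLOPs, ~{seconds / 86400:.1f} days on {gpus} GPUs")
```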

An Observation on Generalization: a one-hour talk by Ilya Sutskever, OpenAI’s Chief Scientist. He has previously talked about how compression may be all you need for intelligence. In this lecture, he builds on the idea of Kolmogorov complexity and how neural networks implicitly seek simplicity in the representations they learn. He brings a clarity of thought about the generalization of these novel systems that is rarely seen in the industry.

Brendan Bycroft wrote a well-done step-by-step visualization of how an LLM works:

Welcome to the walkthrough of the GPT large language model! Here we’ll explore the model nano-gpt, with a mere 85,000 parameters.

Its goal is a simple one: take a sequence of six letters, C B A B B C, and sort them in alphabetical order, i.e. to “ABBBCC”.

Bycroft’s Visual Step-by-Step Description of an LLM
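
To make the toy task concrete, here is an illustrative sketch (not Bycroft’s code) of how such a sorting dataset could be generated and tokenized over the three-letter vocabulary:

```python
# Illustrative sketch of the nano-gpt sorting task described above (not Bycroft's code):
# 6-letter sequences over {A, B, C} paired with their sorted targets, as integer token IDs.
import random

VOCAB = ["A", "B", "C"]
stoi = {ch: i for i, ch in enumerate(VOCAB)}            # token -> id

def make_example(length: int = 6) -> tuple[list[int], list[int]]:
    letters = [random.choice(VOCAB) for _ in range(length)]
    inputs = [stoi[ch] for ch in letters]               # e.g. "CBABBC"
    targets = [stoi[ch] for ch in sorted(letters)]      # e.g. "ABBBCC"
    return inputs, targets

x, y = make_example()
print(x, "->", y)
```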

Operations

How Meta Builds its LLMs

This blog post from Meta outlines the infrastructure being used to train Llama 3. It talks through storage, networking, PyTorch, NCCL, and other improvements. This will lay the foundation for Meta’s H100s coming online throughout the rest of the year.
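
As a generic, hedged illustration of where PyTorch and NCCL fit in such a stack (not Meta’s code), each training process typically joins a NCCL process group and wraps its model in DistributedDataParallel so gradients are all-reduced across GPUs:

```python
# Generic sketch (not Meta's code) of the PyTorch + NCCL layer the post refers to:
# each process joins a NCCL process group and wraps its model in DDP.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    dist.init_process_group(backend="nccl")             # NCCL handles the GPU collectives
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])

# Typically launched with: torchrun --nproc_per_node=8 train.py
```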

Quality Control

Use LLMs to judge the quality of other LLMs’ output (the “LLM-as-judge” pattern)
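
A minimal sketch of the pattern using the OpenAI API mentioned above (the model name and grading rubric are illustrative assumptions):

```python
# Minimal LLM-as-judge sketch: ask one model to grade another model's answer.
# The model name and rubric below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer: str) -> str:
    prompt = (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate correctness and helpfulness from 1 to 5, then justify briefly."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# print(judge("What is 2 + 2?", "4"))
```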